Mark Gameng

CS 350: HW 1

**1.2 – a)** Performance via pipelining

**b)** Dependability via redundancy

**c)** Performance via prediction

**d)** Performance via parallelism

**e)** Make the common case fast

**f)** Hierarchy of memories

**g)** Design for Moore’s Law

**h)** Use abstraction to simplify design

**1.5 – a)** IPS = instructions / second = clock rate / CPI

P1, IPS = 3GHz/1.5CPI = 2 \* 10^9

P2, IPS = 2.5GHz/1.0CPI = 2.5 \* 10^9

P3, IPS = 4.0GHz/2.2CPI = 1.82 \* 10^9

Processor 2 has the highest performance in instructions per second
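
The comparison above can be checked with a short Python sketch. The names `processors` and `instructions_per_second` are illustrative choices, not part of the assignment; the clock rates and CPIs are the ones used above.

```python
# Clock rate (Hz) and CPI for each processor, taken from the values above.
processors = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}


def instructions_per_second(clock_rate_hz, cpi):
    """IPS = clock rate / CPI."""
    return clock_rate_hz / cpi


ips = {
    name: instructions_per_second(rate, cpi)
    for name, (rate, cpi) in processors.items()
}
# The processor with the largest IPS value.
fastest = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value:.3e} instructions/second")
print("Highest IPS:", fastest)
```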

**b)** Number of cycles = clock rate \* time, number of instructions = IPS \* time

P1, cycles = 3.0GHz \* 10s = 3 \* 10^10; instructions = 2 \* 10^9 IPS \* 10s = 2 \* 10^10

P2, cycles = 2.5GHz \* 10s = 2.5 \* 10^10; instructions = 2.5 \* 10^9 IPS \* 10s = 2.5 \* 10^10

P3, cycles = 4.0GHz \* 10s = 4 \* 10^10; instructions = 1.82 \* 10^9 IPS \* 10s = 1.82 \* 10^10
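
As a rough check of part b), the sketch below derives both counts for a 10-second run. The helper name `cycles_and_instructions` is hypothetical; it divides cycles by CPI, which is equivalent to IPS \* time.

```python
RUN_TIME_S = 10

# Clock rate (Hz) and CPI for each processor, taken from part a).
processors = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}


def cycles_and_instructions(clock_rate_hz, cpi, seconds):
    """Return (cycles, instructions) executed in the given time."""
    cycles = clock_rate_hz * seconds
    instructions = cycles / cpi  # same as (clock rate / CPI) * time
    return cycles, instructions


counts = {
    name: cycles_and_instructions(rate, cpi, RUN_TIME_S)
    for name, (rate, cpi) in processors.items()
}
for name, (cycles, instructions) in counts.items():
    print(f"{name}: {cycles:.3e} cycles, {instructions:.3e} instructions")
```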

**c)** With CPI increased by 20% and execution time reduced to 70% of the original: (1.2 \* clock rate(old)) / clock rate(new) = 0.7

Clock rate(new) = (1.2/0.7) \* clock rate(old) = 1.714 \* clock rate(old), a 71% increase in clock rate

P1, 3GHz \* 1.714 = 5.14 GHz

P2, 2.5GHz \* 1.714 = 4.29 GHz

P3, 4GHz \* 1.714 = 6.86 GHz
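
The scaling in part c) can be sketched as follows, assuming the instruction count stays fixed so that execution time is proportional to CPI / clock rate. The function name `new_clock_rate` and its parameter names are illustrative.

```python
def new_clock_rate(old_rate_ghz, cpi_factor=1.2, time_factor=0.7):
    """Clock rate needed when CPI scales by cpi_factor and
    execution time must scale by time_factor.

    time = instructions * CPI / clock rate, so with a fixed instruction
    count: new_rate = old_rate * cpi_factor / time_factor.
    """
    return old_rate_ghz * cpi_factor / time_factor


old_rates_ghz = {"P1": 3.0, "P2": 2.5, "P3": 4.0}
new_rates_ghz = {
    name: new_clock_rate(rate) for name, rate in old_rates_ghz.items()
}

for name, rate in new_rates_ghz.items():
    print(f"{name}: {rate:.2f} GHz")
```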

**1.6** – 10^6 instructions, 10% on A, 20% B, 50% C, 20% D

P1, clock cycles = 1 \* 10^5 + 2 \* 2 \* 10^5 + 3 \* 5 \* 10^5 + 3 \* 2 \* 10^5 = 2.6 \* 10^6

P2, clock cycles = 2 \* 10^5 + 2 \* 2 \* 10^5 + 2 \* 5 \* 10^5 + 2 \* 2 \* 10^5 = 2 \* 10^6

CPU time = clock cycles / clock rate

P1, CPU time = 2.6 \* 10^6 / 2.5GHz = 1.04 \* 10^-3 seconds

P2, CPU time = 2 \* 10^6 / 3GHz = 6.67 \* 10^-4 seconds

P2 is faster than P1

**a)** CPI = clock cycles / number of instructions

P1, CPI = 2.6 \* 10^6 / 10^6 = 2.6

P2, CPI = 2 \* 10^6 / 10^6 = 2

**b)** The clock cycle counts are calculated above: P1 needs 2.6 \* 10^6 clock cycles and P2 needs 2 \* 10^6 clock cycles
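
The 1.6 arithmetic can be reproduced with the sketch below. The per-class CPIs and clock rates are read off the cycle sums and CPU-time lines above; the names `class_mix`, `machines`, and `summarize` are illustrative.

```python
INSTRUCTION_COUNT = 1e6

# Fraction of the instructions in each class.
class_mix = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}

# Clock rate and per-class CPI for each implementation, as used above.
machines = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def summarize(machine):
    """Return total cycles, global CPI, and CPU time for one machine."""
    cycles = sum(
        INSTRUCTION_COUNT * class_mix[c] * machine["cpi"][c] for c in class_mix
    )
    return {
        "cycles": cycles,
        "cpi": cycles / INSTRUCTION_COUNT,
        "cpu_time_s": cycles / machine["clock_hz"],
    }


results = {name: summarize(m) for name, m in machines.items()}
for name, r in results.items():
    print(f"{name}: {r['cycles']:.3e} cycles, CPI {r['cpi']:.2f}, "
          f"{r['cpu_time_s']:.3e} s")
```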

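**1.9** – CPIs of 1, 12, and 5 for arithmetic, load/store, and branch instructions. 2.56E9 arithmetic, 1.28E9 load/store, and 2.56E8 (256 million) branch instructions. 2GHz clock. With p processors, the arithmetic and load/store instruction counts per processor are divided by 0.7 \* p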

**1.9.1**) Clock cycles (1 processor) = 1 \* 2.56E9 + 12 \* 1.28E9 + 5 \* 2.56E8 = 1.92E10

Clock cycles (p processors) = ((1 \* 2.56E9 + 12 \* 1.28E9)/(0.7 \* p)) + 5 \* 2.56E8

CPU time = clock cycles / clock rate

1 Processor = 9.6 seconds execution time

2 Processors = 7.04 seconds, 1.36 relative speedup

4 Processors = 3.84 seconds, 2.5 relative speedup

8 Processors = 2.24 seconds, 4.29 relative speedup
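
The execution times and speedups above follow from one formula, sketched here in Python. The function name `execution_time` and its keyword parameters are illustrative; the 0.7 \* p divisor is applied only when more than one processor is used, as in the calculations above.

```python
CLOCK_HZ = 2e9
ARITH = 2.56e9        # arithmetic instructions
LOAD_STORE = 1.28e9   # load/store instructions
BRANCH = 2.56e8       # branch instructions


def execution_time(p, cpi_arith=1, cpi_load_store=12, cpi_branch=5):
    """Execution time in seconds on p processors."""
    parallel_cycles = cpi_arith * ARITH + cpi_load_store * LOAD_STORE
    if p > 1:
        # Arithmetic and load/store work is divided by 0.7 * p.
        parallel_cycles /= 0.7 * p
    # Branch instructions are not divided among the processors.
    cycles = parallel_cycles + cpi_branch * BRANCH
    return cycles / CLOCK_HZ


base = execution_time(1)
for p in (1, 2, 4, 8):
    t = execution_time(p)
    print(f"{p} processor(s): {t:.2f} s, speedup {base / t:.2f}")
```

Passing `cpi_arith=2` to the same helper gives the doubled-arithmetic-CPI variant.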

**1.9.2**) CPI of arithmetic doubled. Same calculations as above except CPI of 2 for arithmetic

1 Processor = 10.88 seconds, 13% slower

2 Processors = 7.95 seconds, 13% slower

4 Processors = 4.30 seconds, 12% slower

8 Processors = 2.47 seconds, 10% slower

**1.9.3**) Change CPI of load for a single processor to match performance of 4 processors using original CPI

1 \* 2.56E9 + X \* 1.28E9 + 5 \* 2.56E8 = ((1 \* 2.56E9 + 12 \* 1.28E9)/(0.7 \* 4)) + 5 \* 2.56E8

X = 3, CPI for load/store should be 3
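
Solving the equation above for X can be sketched as below. The variable names are illustrative; the instruction counts and CPIs are the ones given for 1.9.

```python
ARITH = 2.56e9        # arithmetic instructions, CPI 1
LOAD_STORE = 1.28e9   # load/store instructions, original CPI 12
BRANCH = 2.56e8       # branch instructions, CPI 5

# Cycle count of the 4-processor configuration with the original CPIs.
target_cycles = (1 * ARITH + 12 * LOAD_STORE) / (0.7 * 4) + 5 * BRANCH

# Single processor: 1*ARITH + X*LOAD_STORE + 5*BRANCH = target_cycles.
x = (target_cycles - 1 * ARITH - 5 * BRANCH) / LOAD_STORE

print(f"Required load/store CPI: {x:.2f}")
```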

**1.11** – 2.389E12 instructions, 750 seconds execution time, reference time 9650 seconds

**1.11.1**) CPI = execution time / (instruction count \* clock cycle time) = 750 / (2.389E12 \* 3.3E-10) = 0.95

**1.11.2**) SPECratio = reference time / measured time = 9650 / 750 = 12.867

**1.11.3**) 10% increase in instructions – 1.1 \* 750 = 825 seconds

**1.11.4**) 10% increase in instructions, 5% CPI – 1.1 \* 1.05 \* 750 = 866.25 seconds

**1.11.5**) SPECratio = 9650 / (1.1 \* 1.05 \* 750) = 11.140
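
Parts 1.11.2 through 1.11.5 can be checked with the sketch below, which assumes the clock cycle time is unchanged so that execution time scales with instruction count \* CPI. The helper names `scaled_time` and `spec_ratio` are illustrative.

```python
REFERENCE_TIME_S = 9650
BASE_TIME_S = 750


def scaled_time(base_time_s, instruction_factor=1.0, cpi_factor=1.0):
    """Execution time after scaling instruction count and CPI.

    Assumes an unchanged clock cycle time, so time is proportional to
    instruction count * CPI.
    """
    return base_time_s * instruction_factor * cpi_factor


def spec_ratio(measured_time_s):
    """SPECratio = reference time / measured time."""
    return REFERENCE_TIME_S / measured_time_s


more_instructions = scaled_time(BASE_TIME_S, instruction_factor=1.1)
more_instructions_and_cpi = scaled_time(BASE_TIME_S, 1.1, 1.05)

print(f"Original SPECratio: {spec_ratio(BASE_TIME_S):.3f}")
print(f"+10% instructions: {more_instructions:.2f} s")
print(f"+10% instructions, +5% CPI: {more_instructions_and_cpi:.2f} s")
print(f"New SPECratio: {spec_ratio(more_instructions_and_cpi):.3f}")
```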

**1.12** – P1, 4GHz, CPI 0.9, 5E9 instructions; P2, 3GHz, CPI .75, 1E9 instructions

**1.12.1**) Clock cycles = CPI \* instructions, CPU time = clock cycles / clock rate

P1, CPU time = (.9 \* 5E9)/ 4GHz = 1.125 seconds

P2, CPU time = (.75 \* 1E9)/ 3GHz = .25 seconds

The claim that a higher clock rate means higher performance is not true for P1 and P2: P2 performs better even though its clock rate is lower.

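**1.12.2**) P1, CPU time for 1E9 instructions = (0.9 \* 1E9) / 4GHz = 0.225 seconds. P2 needs 0.25 seconds for 1E9 instructions, so the fraction it completes in 0.225 seconds is 0.225 / 0.25 = 0.9

0.9 \* 10^9 = 9E8 instructions that P2 can execute in the same time that P1 needs to execute 1E9 instructions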
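
The same result can be obtained directly from P2's clock rate and CPI, as in this sketch. The helper names `cpu_time` and `instructions_in` are illustrative.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds = instructions * CPI / clock rate."""
    return instructions * cpi / clock_hz


def instructions_in(seconds, cpi, clock_hz):
    """Instructions executed in the given time = time * clock rate / CPI."""
    return seconds * clock_hz / cpi


# Time P1 (4 GHz, CPI 0.9) needs for 1E9 instructions.
p1_time = cpu_time(1e9, cpi=0.9, clock_hz=4e9)

# Instructions P2 (3 GHz, CPI 0.75) executes in that same time.
p2_instructions = instructions_in(p1_time, cpi=0.75, clock_hz=3e9)

print(f"P1 time: {p1_time:.3f} s")
print(f"P2 instructions in that time: {p2_instructions:.3e}")
```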

**1.12.3**) MIPS = instructions / (execution time \* 10^6)

P1, MIPS = 4444

P2, MIPS = 4000

P1 MIPS > P2 MIPS but P2 has a better performance
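
The mismatch between MIPS and actual performance can be seen by computing both side by side. This is a minimal sketch; the dictionary layout and the helper names `cpu_time` and `mips` are illustrative.

```python
# Parameters for each processor as given for 1.12.
machines = {
    "P1": {"clock_hz": 4e9, "cpi": 0.9, "instructions": 5e9},
    "P2": {"clock_hz": 3e9, "cpi": 0.75, "instructions": 1e9},
}


def cpu_time(m):
    """CPU time in seconds = CPI * instructions / clock rate."""
    return m["cpi"] * m["instructions"] / m["clock_hz"]


def mips(m):
    """MIPS = instructions / (execution time * 10^6)."""
    return m["instructions"] / (cpu_time(m) * 1e6)


for name, m in machines.items():
    print(f"{name}: {cpu_time(m):.3f} s, {mips(m):.0f} MIPS")
```

P1 has the higher MIPS rating but also the longer CPU time, because the two programs have different instruction counts.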

**1.12.4**) MFLOPS = No. FP operations / (execution time \* 10^6), with 40% of the instructions being FP operations

P1, FP operations = 5E9 \* 0.4 = 2E9, execution time = 1.125 seconds

P2, FP operations = 1E9 \* 0.4 = 4E8, execution time = 0.25 seconds

P1, MFLOPS = 2E9 / (1.125 \* 10^6) = 1778

P2, MFLOPS = 4E8 / (0.25 \* 10^6) = 1600

**1.15**) 100 seconds for 1 processor. 4 seconds overhead if multiple processors

| # of Processors | Execution time (s) | Total time with overhead (s) | Relative speedup | Actual / ideal speedup |
| --- | --- | --- | --- | --- |
| 1 | 100 | 100 | 1 | 1 |
| 2 | 50 | 54 | 1.85 | 0.93 |
| 4 | 25 | 29 | 3.45 | 0.86 |
| 8 | 12.5 | 16.5 | 6.06 | 0.76 |
| 16 | 6.25 | 10.25 | 9.76 | 0.61 |
| 32 | 3.125 | 7.125 | 14.04 | 0.44 |
| 64 | 1.5625 | 5.5625 | 17.98 | 0.28 |
| 128 | 0.78125 | 4.78125 | 20.92 | 0.16 |
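
The table can be regenerated with the sketch below, assuming the 4-second overhead applies only when more than one processor is used and that the last column is the actual speedup divided by the ideal speedup (the processor count). The function name `row` is illustrative.

```python
SERIAL_TIME_S = 100.0
OVERHEAD_S = 4.0


def row(p):
    """Return (processors, execution time, total time, speedup, ratio)."""
    exec_time = SERIAL_TIME_S / p
    # Overhead is paid only when the work is split across processors.
    total = exec_time + (OVERHEAD_S if p > 1 else 0.0)
    speedup = SERIAL_TIME_S / total
    return p, exec_time, total, speedup, speedup / p


rows = [row(2 ** k) for k in range(8)]  # 1, 2, 4, ..., 128 processors

for p, exec_time, total, speedup, ratio in rows:
    print(f"{p:>4} {exec_time:>9.5f} {total:>9.5f} {speedup:>6.2f} {ratio:>5.2f}")
```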